Line Search

In optimization, line search is a basic iterative approach to find a local minimum \mathbf{x}^* of an objective function f:\mathbb R^n\to\mathbb R. It first finds a descent direction along which the objective function f will be reduced, and then computes a step size that determines how far \mathbf{x} should move along that direction. The descent direction can be computed by various methods, such as gradient descent or a quasi-Newton method. The step size can be determined either exactly or inexactly.


One-dimensional line search

Suppose ''f'' is a one-dimensional function, f:\mathbb R\to\mathbb R, and assume that it is unimodal, that is, it contains exactly one local minimum ''x''* in a given interval [''a'',''z'']. This means that ''f'' is strictly decreasing in [''a'',''x''*] and strictly increasing in [''x''*,''z'']. There are several ways to find an (approximate) minimum point in this case.


Zero-order methods

Zero-order methods use only function evaluations (i.e., a value oracle), not derivatives:
* Ternary search: pick two points ''b'',''c'' such that ''a''<''b''<''c''<''z''. If f(''b'')≤f(''c''), then ''x''* must be in [''a'',''c'']; if f(''b'')≥f(''c''), then ''x''* must be in [''b'',''z'']. In both cases, we can replace the search interval with a smaller one. If we pick ''b'',''c'' very close to the interval center, then the interval shrinks by a factor of ~1/2 at each iteration, but we need two function evaluations per iteration; therefore, the method has linear convergence with rate \sqrt{1/2}\approx 0.71. If we pick ''b'',''c'' such that the partition ''a'',''b'',''c'',''z'' has three equal-length intervals, then the interval shrinks by a factor of 2/3 at each iteration, so the method has linear convergence with rate \sqrt{2/3}\approx 0.82.
* Fibonacci search: This is a variant of ternary search in which the points ''b'',''c'' are selected based on the Fibonacci sequence. At each iteration, only one function evaluation is needed, since the other point was already an endpoint of a previous interval. Therefore, the method has linear convergence with rate 1/\varphi\approx 0.618.
* Golden-section search: This is a variant in which the points ''b'',''c'' are selected based on the golden ratio. Again, only one function evaluation is needed in each iteration, and the method has linear convergence with rate 1/\varphi\approx 0.618; this ratio is optimal among the zero-order methods (see the code sketch after this list).
Zero-order methods are very general: they do not assume differentiability or even continuity.
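
The following is a minimal Python sketch of golden-section search, assuming only that f is unimodal on the given interval; the function name, tolerance, and example are illustrative, not part of any standard library.

 import math
 
 def golden_section_search(f, a, z, tol=1e-8):
     """Return an approximate minimizer of a unimodal f on [a, z]."""
     invphi = (math.sqrt(5) - 1) / 2   # 1/phi ~ 0.618: the interval-reduction rate
     b = z - invphi * (z - a)          # interior point closer to a
     c = a + invphi * (z - a)          # interior point closer to z
     fb, fc = f(b), f(c)
     while z - a > tol:
         if fb <= fc:                  # the minimum lies in [a, c]
             z, c, fc = c, b, fb       # the old b becomes the new c, reusing f(b)
             b = z - invphi * (z - a)
             fb = f(b)                 # only one new evaluation per iteration
         else:                         # the minimum lies in [b, z]
             a, b, fb = b, c, fc       # the old c becomes the new b, reusing f(c)
             c = a + invphi * (z - a)
             fc = f(c)
     return (a + z) / 2
 
 # Example: minimize (x - 2)^2 on [0, 5]; the result is close to 2.
 print(golden_section_search(lambda x: (x - 2)**2, 0.0, 5.0))

Choosing the interior points by Fibonacci ratios instead of 1/\varphi gives Fibonacci search; the structure of the loop is the same.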


First-order methods

First-order methods assume that ''f'' is continuously differentiable, and that we can evaluate not only ''f'' but also its derivative.
* The bisection method computes the derivative of ''f'' at the center of the interval, ''c'': if f'(''c'')=0, then this is the minimum point; if f'(''c'')>0, then the minimum must be in [''a'',''c'']; if f'(''c'')<0, then the minimum must be in [''c'',''z'']. This method has linear convergence with rate 0.5.
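
A minimal sketch of this derivative-based bisection, assuming fprime is the derivative of a unimodal, continuously differentiable f on [a, z] (the names and tolerance are illustrative):

 def bisection_minimize(fprime, a, z, tol=1e-8):
     """Halve [a, z] at each iteration using only the sign of f'."""
     while z - a > tol:
         c = (a + z) / 2
         dc = fprime(c)
         if dc == 0:        # exact stationary point found
             return c
         elif dc > 0:       # f is increasing at c, so the minimum is in [a, c]
             z = c
         else:              # f is decreasing at c, so the minimum is in [c, z]
             a = c
     return (a + z) / 2
 
 # Example: minimize (x - 2)^2 on [0, 5] via its derivative 2*(x - 2).
 print(bisection_minimize(lambda x: 2 * (x - 2), 0.0, 5.0))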


Curve-fitting methods

Curve-fitting methods try to attain superlinear convergence by assuming that ''f'' has some analytic form, e.g. a polynomial of finite degree. At each iteration, there is a set of "working points" at which we know the value of ''f'' (and possibly also its derivative). Based on these points, we can compute a polynomial that fits the known values and find its minimum analytically. The minimum point becomes a new working point, and we proceed to the next iteration:
* Newton's method is a special case of a curve-fitting method, in which the curve is a degree-two polynomial, constructed using the first and second derivatives of ''f''. If the method is started close enough to a non-degenerate local minimum (i.e., one with a positive second derivative), then it has quadratic convergence (see the sketch after this list).
* Regula falsi is another method that fits a degree-two polynomial, but it uses the first derivative at two points, rather than the first and second derivative at the same point. If the method is started close enough to a non-degenerate local minimum, then it has superlinear convergence of order \varphi \approx 1.618.
* ''Cubic fit'' fits a degree-three polynomial, using both the function values and the derivatives at the last two points. If the method is started close enough to a non-degenerate local minimum, then it has quadratic convergence.
Curve-fitting methods have superlinear convergence when started close enough to the local minimum, but might diverge otherwise. ''Safeguarded curve-fitting methods'' execute a linear-convergence method in parallel with the curve-fitting method. At each iteration, they check whether the point found by the curve-fitting method is close enough to the interval maintained by the safeguard method; if it is not, then the safeguard method is used to compute the next iterate.
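
A minimal sketch of Newton's method for one-dimensional minimization, assuming fprime and fsecond are the first and second derivatives and that the starting point is close enough to a non-degenerate local minimum; the example function, tolerance, and iteration cap are illustrative:

 def newton_minimize(fprime, fsecond, x0, tol=1e-10, max_iter=50):
     """Iterate x <- x - f'(x)/f''(x), the minimizer of the local quadratic model."""
     x = x0
     for _ in range(max_iter):
         step = fprime(x) / fsecond(x)
         x -= step
         if abs(step) < tol:   # near the minimum, the step shrinks quadratically
             break
     return x
 
 # Example: f(x) = x^4 - 3x^2 + x, with f'(x) = 4x^3 - 6x + 1 and f''(x) = 12x^2 - 6.
 print(newton_minimize(lambda x: 4 * x**3 - 6 * x + 1,
                       lambda x: 12 * x**2 - 6,
                       x0=1.0))

Far from the minimum, or where the second derivative is not positive, this iteration can diverge, which is why safeguarded variants fall back on a linearly convergent method such as bisection.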


Multi-dimensional line search

In general, we have a multi-dimensional objective function f:\mathbb R^n\to\mathbb R. The line-search method first finds a descent direction along which the objective function f will be reduced, and then computes a step size that determines how far \mathbf{x} should move along that direction. The descent direction can be computed by various methods, such as gradient descent or a quasi-Newton method. The step size can be determined either exactly or inexactly.

Here is an example gradient method that uses a line search in step 2.3:
# Set iteration counter k=0 and make an initial guess \mathbf{x}_0 for the minimum. Pick a tolerance \epsilon.
# Loop:
## Compute a descent direction \mathbf{p}_k.
## Define a one-dimensional function h(\alpha_k)=f(\mathbf{x}_k+\alpha_k\mathbf{p}_k), representing the function value along the descent direction as a function of the step size.
## Find an \alpha_k that minimizes h over \alpha_k\in\mathbb R_+.
## Update \mathbf{x}_{k+1}=\mathbf{x}_k+\alpha_k\mathbf{p}_k, and set k=k+1.
# Until \|\nabla f(\mathbf{x}_k)\|<\epsilon.
At the line-search step (2.3), the algorithm may minimize ''h'' ''exactly'', by solving h'(\alpha_k)=0, or ''approximately'', by using one of the one-dimensional line-search methods mentioned above. It can also be solved ''loosely'', by asking for a sufficient decrease in ''h'' that does not necessarily approximate the optimum. One example of the former is the conjugate gradient method. The latter is called inexact line search and may be performed in a number of ways, such as a backtracking line search or using the Wolfe conditions. A sketch of the loop with a backtracking line search is given below.


Overcoming local minima

Like other optimization methods, line search may be combined with simulated annealing to allow it to jump over some local minima.


See also

* Trust region - a dual approach for finding a local minimum: it first computes a step size, and then determines the descent direction.
* Grid search
* Learning rate
* Pattern search (optimization)
* Secant method

